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Abstract. The best known algorithm to compute the Jacobi symbol 
of two n-bit integers runs in time 0(M(n) log n), using Schonhage's fast 
continued fraction algorithm combined with an identity due to Gauss. We 
give a different 0(M(n) log n) algorithm based on the binary recursive 
gcd algorithm of Stehle and Zimmermann. Our implementation — which 
to our knowledge is the first to run in time 0(M (n) log n) — is faster than 
GMP's quadratic implementation for inputs larger than about 10000 
decimal digits. 



1 Introduction 

We want to compute the Jacobi symbol 3 (b\a) for n-bit integers a 
and b, where a is odd positive. We give three algorithms based on 
the 2-adic gcd from Stehle and Zimmermann [13]. First we give an 
algorithm whose worst-case time bound is 0(M(n)n 2 ) = 0(n 3 ); we 
call this the cubic algorithm although this is pessimistic since the 
algorithm is quadratic on average as shown in [5] , and probably also 
in the worst case. We then show how to reduce the worst-case to 
0(M(n)n) = 0(n 2 ) by combining sequences of "ugly" iterations 
(defined in Section 1.1) into one "harmless" iteration. Finally, we 
obtain an algorithm with worst-case time 0(M(n) logn). This is, 
up to a constant factor, the same as the time bound for the best 
known algorithm, apparently never published in full, but sketched 
in Bach [1] and in more detail in Bach and Shallit [2] (with credit to 
Bachmann [3]). 



3 Notation: we write the Jacobi symbol as (6|o), since this is easier to typeset and less 
ambiguous than the more usual (-). M(n) is the time to multiply n-bit numbers. 
0(/(n)) means 0(f(n) (log f(n)) c ) for some constant c > 0. 



The latter algorithm makes use of the Knuth- Schonhage fast con- 
tinued fraction algorithm [9] and an identity of Gauss [6]. Although 
this algorithm has been attributed to Schonhage, Schonhage him- 
self gives a different 0(M(n) logn) algorithm [10, 15] which does not 
depend on the identity of Gauss. The algorithm is mentioned in 
Schonhage's book [11, §7.2.3], but no details are given there. 

With our algorithm it is not necessary to compute the full con- 
tinued fraction or to use the identity of Gauss for the Jacobi symbol. 
Thus, it provides an alternative that may be easier to implement. 

It is possible to modify some of the other fast GCD algorithms 
considered by Moller [8] to compute the Jacobi symbol, but we do not 
consider such possibilities here. At best they give a small constant 
factor speedup over our algorithm. 

We recall the main identities satisfied by the Jacobi symbol: 
(bc\a) = (b\a)(c\a); (2|a) = (_l)(« 2 -i)/8 ; ( 6 | a ) = (-l)(«-i)(*-i)/4( a |&) 
for a, b odd; and (b\a) = if (a, b) ^ I. 

Note that all our algorithms compute (b\a) with b even positive 
and a odd positive. For the more general case where b is any integer, 
we can reduce to b even and positive using (b\a) = ( — l)( a_1 )/ 2 (— b\a) 
if b is negative, and (b\a) = (b+a\a) if b is odd. 

We first describe a cubic algorithm to compute the Jacobi sym- 
bol. The quadratic algorithm in Section 2 is based on this cubic 
algorithm, and the subquadratic algorithm in Section 3 uses the 
same ideas as the quadratic algorithm but with an asymptotically 
fast recursive implementation. 

For a G Z, the notation v{a) denotes the 2-adic valuation z/ 2 (a) 
of a, that is the maximum k such that 2 k \a, or +oo if a = 0. 

1.1 Binary Division with Positive Quotient 

Throughout the paper we use the binary division with positive quo- 
tient defined by Algorithm 1.1. Compared to the "centered division" 
of [13], it returns a quotient in [1,2 J+1 -1] instead of in [1-2-?, 2 j -l]. 
Note that the quotient q is always odd. 

With this binary division, we define Algorithm CubicBinary- 
Jacobi, where the fact that the quotient q is positive ensures that all 



Algorithm 1.1 BinaryDividePos 

Input: a, b G N with v(a) = < v{b) = j 

Output: q and r = a + gfo/2 j such that < q < 2 j+1 , v(b) < v{r) 

1: q i a/(6/2-') mod 2 j+1 > q is odd and positive 

2: return q,r = a + qb/2 j . 



a,b terms computed remain positive, and a remains odd, thus (b\a) 
remains well-defined. 4 



Algorithm 1.2 CubicBinaryJacobi 

Input: a, b G N with v{a) = < v{b) 
Output: Jacobi symbol (b\a) 

1: s «— 0, j z/(b) 

2: while 2 j a / 6 do 

3: 6' <- b/2 3 

4: (q, r) BinaryDividePos(a, b) 

5: s ^ (s + j(a 2 - l)/8 + (a - - l)/4 + j(&' 2 - l)/8) mod 2 

6: (a,6)^(6',r/2^), j 1/(6) 

7: if a — 1 then return (— l) s else return 



Theorem 1. Algorithm CubicBinaryJacobi is correct (assuming it 
terminates). 

Proof. We prove that the following invariant holds during the algo- 
rithm, if do, &o are the initial values of a, b: 

(b \a ) = (-l) s (%). 

This is true before we enter the while-loop, since s = 0, a = do, and 
b = b . For each step in the while loop, we divide b by 2 J , swap a and 
b' = 6/2 J , replace a by r = a + go', and divide r by 2 J . The Jacobi 
symbol is modified by a factor (— l)^ a _1 )/ 8 for the division of 6 by 
2 j , by a factor (— l)( a ^ 1 )( 6 '" 1 )/ 4 for the interchange of a and 6', and 
by a factor (— l)^ b ' for the division of r by 2 j . At the end of 
the loop, we have gcd(a ,o ) = a; if a = 1, since (6|1) = 1, we have 
(6o]a ) = (— l) s , otherwise (6 |a ) =0. □ 

4 Moller says in [8]: "if one tries to use positive quotients < q < 2 k+1 , the [binary 
gcd] algorithm no longer terminates". However, with a modified stopping criterion 
as in Algorithm CubicBinaryJacobi, the algorithm terminates (we prove this below). 



Lemma 1. The quantity a+26 is non-increasing in Algorithm Cubic- 
BinaryJacobi. 



Proof. At each iteration of the "while" loop, a becomes 6/2 J , and b 
becomes (a + qb/2^)/2K In matrix notation 

a\ . ( 1/2' \ /a 



Therefore a + 26 becomes 



(2) 



Since j > 1, the first term is bounded by a. In the second term, 
q < 2-? +1 - 1, thus the second term is bounded by (5/2-? - 2/2 2j ')6, 
which is bounded by 96/8 for j > 2, and equals 2b for j = 1. n 

If j > 2, then a + 2b is multiplied by a factor at most 9/16. If 
j — q — 1 then a + 2b decreases, but by a factor which could be 
arbitrarily close to 1. The only case where a + 2b does not decrease 
is when j = 1 and q = 3; in this case a + 26 is unchanged. 

This motivates us to define three classes of iterations: good, bad, 
and ugly. Let us say that we have a good iteration when j > 2, a 
bad iteration when j — q — 1, and an ugfo/ iteration when j = 1 
and g = 3. Since g is odd and 1 < q < 2 j+1 — 1, this covers all 
possibilities. For a bad iteration, (a, 6) becomes (6/2, a/2 + 6/4), and 
for an ugly iteration, (a, 6) becomes (6/2, a/2 + 36/4). We denote the 
matrices corresponding to good, bad and ugly iterations by G, B 
and U respectively. Thus 

/ 1/2A / 1/2 \ / 1/2 

- ^".9 - 1 /OJ « //I J ' ^ ~ I 1 /O 1 //I ) ' ^ 



1/2 1/4 ) ' V 1/2 3/4 



The effect of m successive ugly iterations is easily seen to be given 
by the matrix 

U m = -( l + 4 (" 1 /4) m 2 - 2(-l/4)™\ 

' 5 V2-2(-l/4) m 4+ (-1/4)" 1 y ' U 



Assume we start from (a, 6) = (ao,&o), and after m > successive 
ugly iterations we get values (a m ,b m ). Then, from Equation (3), 



We can not have 2a = 6 or the algorithm would have terminated. 
However, a m must be an integer. This gives an upper bound on 
m. For ao, 60 of n bits, the number of successive ugly iterations is 
bounded by n/2 + 0(1) (a precise statement is made in Lemma 2). 

If there were no bad iterations, this would prove that for n-bit 
inputs the number of iterations is 0(n 2 ), since each sequence of ugly 
iterations would be followed by at least one good iteration. Bad 
iterations can be handled by a more complicated argument which we 
omit, since they will be considered in detail in §2 when we discuss the 
complexity of the quadratic algorithm (see the proof of Theorem 2). 

Since the number of iterations is 0(n 2 ) from Theorem 2, and 
each iteration costs time 0(M(n)), the overall time for Algorithm 
CubicBinaryJacobi is 0(n 2 M(n)) = 0(n 3 ). Note that this worst- 
case bound is almost certainly too pessimistic (see §4). 

2 A Provably Quadratic Algorithm 

Suppose we have a sequence of m > ugly iterations. It is possible 
to combine the m ugly iterations into one harmless iteration which 
is not much more expensive than a normal (good or bad) iteration. 
Also, it is possible to predict the maximal such m in advance. Us- 
ing this trick, we reduce the number of iterations (good, bad and 
harmless) to 0(n) and their cost to 0(M(n)n) = 0(n 2 ). 

Without loss of generality, suppose that we start from (ao, bo) = 
(a, b). Since a is odd, we never have a = 2b. 

Lemma 2. If fx — v(a — 6/2), then we have exactly |_/x/2j ugly iter- 
ations starting from (a, 6), followed by a good iteration if fx is even, 
and by a bad iteration if fx is odd. 

Proof. We prove the lemma by induction on fx. If fx = 0, a — 6/2 
is odd, but a is odd, so 6/2 is even, which yields j > 2 in Binary- 
DividePos, thus a, 6 yield a good iteration. If fx — 1, a — 6/2 is even, 



5a m = (a + 26) + 2(2a - 6)(-l/4) m , 
56 m = 2(a + 26) - (2a - 6)(-l/4) m . 



(4) 
(5) 



which implies that 6/2 is odd, thus we have j — 1. If we had q = 3 
in BinaryDividePos, this would mean that a + 3(6/2) = mod 4, or 
equivalently a — 6/2 = mod 4, which is incompatible with /i — 1. 
Thus we have 5 = 1, and a bad iteration. 

Now assume fi > 2. The first iteration is ugly since 4 divides a — 
6/2, which implies that 6/2 is odd. Thus j = 1, and a — 6/2 = mod 4 
implies that q = 3. After one ugly iteration (a, 6) becomes (6/2, a/2 + 
36/4), thus a — 6/2 becomes —(a — 6/2)/4, and the 2-valuation of 
a — 6/2 decreases by 2. □ 

From the above, we see that, for a sequence of m ugly iterations, 
a , a±, . . . , a m satisfy the three-term recurrence 



and similarly for 6 , 6 1? . . . , b m . It follows that = a mod 4, and 
similarly 6j = 6 mod 4, for 1 < i < m. 

We can modify Algorithm CubicBinaryJacobi to consolidate m 
consecutive ugly iterations into one harmless iteration, using the 
expressions (4)-(5) for a m and b m (we give an optimised evaluation 
below). It remains to modify step 5 of CubicBinaryJacobi to take 
account of the m updates to s. Since j = 1 for each ugly iteration, 
we have to increment s by an amount 



where we write 6^ for 6j/2. However, a^+i = 6- for < i < m, so the 
terms involving division by 8 "collapse" mod 2, leaving just the first 
and last terms. The terms involving two divisions by 2 are all equal 
to (a — l)/2 • (6' — l)/2 mod 2, using the observation that a,i mod 4 
is constant for < i < m. Thus 



One further simplification is possible. Since = a>i mod 4, and ao is 
odd, we can replace a\ by ao in the last term, and use the fact that 



4a i+ i — 3aj — aj_i = for < % < m, 





x 2 = x mod 2 to obtain 





We can economise the computation of a m and b m from (4)-(5) by 
first computing 

d = a - b', m = v{d) div 2, c = (d - (-l) m (d/4 m ))/5, 

where the divisions by 4 m and by 5 are exact; then a m = a — 4c, 
b m = b + 2c. 

From these observations, it is easy to modify Algorithm Cubic- 
BinaryJacobi to obtain Algorithm QuadraticBinaryJacobi. In this 
algorithm, steps 7-11 implement a harmless iteration equivalent to 
m > consecutive ugly iterations; steps 13-14 implement bad and 
good iterations, and the remaining steps are common to both. Step 5 
of Algorithm CubicBinaryJacobi is split into three steps 4, 13 and 15. 
In the case of a harmless iteration, the computation of 5 satisfying (6) 
is implicit in steps 4, 10 and 15. 



Algorithm 2.1 QuadraticBinaryJacobi 

Input: a, b G N with v(a) = < v(b) 
Output: Jacobi symbol (b\a) 

1: s «— 0, j «— v(b) 

2: while 2 3 a / b do 

3: b' <r- b/2 3 

4: s •(- (s + j(a 2 - 1) /8) mod 2 

5: (g, r) «— BinaryDividePos(a, 6) 

6: if (j,q) = (1,3) then 

7: d^a-b' 

8: m <- ^(d) div 2 

9: c <- (d- (-l) m d/4 m )/5 

10: s <- (s + m(a - l)/2) mod 2 

11: (a, 6) <- (a - 4c, 6 + 2c) 

12: else 

13: s <- (s + (a - 1)(6' - 1) /4) mod 2 

14: (a, 6) <- (&',r/2 J ) 

15: s^(s+ j(a 2 - 1) /8) mod 2, j v(b) 
16: if a = 1 then return (— l) s else return 



t> harmless iteration 



> good or bad iteration 



Theorem 2. Algorithm QuadraticBinaryJacobi is correct and ter- 
minates after O {n) iterations of the "while" loop (steps 2-15) if the 
inputs are positive integers of at most n bits, with = v{a) < v(b). 



Proof. Correctness follows from the equivalence to Algorithm Cubic- 
Binary Jacobi. To prove that convergence takes 0(n) iterations, we 
show that a + 2b is multiplied by a factor at most 5/8 in each block 
of three iterations. This is true if the block includes at least one good 
iteration, so we need only consider harmless and bad iterations. Two 
harmless iterations do not occur in succession, so the block must 
include either (harmless, bad) or (bad, bad). In the first case, the 
corresponding matrix is BU m = BU ■ U" 1 ^ 1 for some m > 0. We 
saw in §1.1 that the matrix U leaves a + 26 unchanged, so U m ~ l also 
leaves a + 2b unchanged, and we need only consider the effect of BU. 
Suppose that (a, b) is transformed into (a, b) by BU. Thus 



The case of two successive bad iterations is similar - just replace BU 
by B 2 in the above, and deduce that a + 2b < (a + 2b)/ 2. 

We conclude that the number of iterations of the while loop is at 
most cn + 0(1), where c = 3/ log 2 (8/5) « 4.4243. n 

Remarks 

1. A more complicated argument along similar lines can reduce the 
constant c to 2/ \og 2 (l/ p(BU)) = 2/log 2 ((ll - v / 57)/2) « 2.5424. 
Here p denotes the spectral radius: p(A) = lim^oo | \A k \ 

2. In practice QuadraticBinaryJacobi is not much (if any) faster 
than CubicBinaryJacobi. Its advantage is simply the better worst- 
case time bound. A heuristic argument suggests that on average only 
1/4 of the iterations of CubicBinaryJacobi are ugly. 

3. Our implementations of CubicBinaryJacobi and QuadraticBinary- 
Jacobi are slower than GMP's 0(n 2 ) algorithm (which is based on 
Stein's binary gcd, as in Shallit and Sorenson [12]). However, in the 
next section we use the ideas of our QuadraticBinaryJacobi algo- 
rithm to get an 0(M(n) log n) algorithm. We do not see how to 
modify the algorithm of Shallit and Sorenson to do this. 5 

5 In Algorithm Binary Jacobi in [12], it is necessary to know the sign of a — n (b — a in 
our notation) to decide whether to perform an interchange. This makes it difficult 
to construct a recursive 0(M(n) logn) algorithm like Algorithm HalfBinaryJacobi. 




We see that 



~ ~ a 5b 5, „ IS 
a + 2b=- + — < -(a + 26). 
2 4 ~ 8 



3 An 0(M(n) log n) Algorithm 



Algorithm HalfBinaryJacobi is a modification of Algorithm Half-GB- 
gcd from [13]. The main differences are the following: 

1. binary division with positive (not centered) quotient is used; 

2. the algorithm returns an integer s such that if a, b are the inputs, 
c,d the output values defined by Theorem 3, then 

(b\a) = (-iy(d\c); 

3. at steps 4 and 27, we reduce mod 2 2fcl+2 (resp. 2 2fe2+2 ) instead of 
mod 2 2fcl+1 (resp. 2 2fe2+1 ), so that we have enough information to 
correctly update s at steps 10, 17, 21 and 25; 

4. we have to "cut" some harmless iterations in two (step 15). 

Remarks. The matrix Q occurring at step 19 is just 2 2m U m , where 
U m is given by Equation (3). Similarly, the matrix Q occurring at 
step 23 is 2 2jo Gj 0tq . In practice, steps 13-20 can be omitted (so the al- 
gorithm becomes a fast version of CubicBinaryJacobi) - this variant 
is simpler and slightly faster on average. 

Theorem 3. Let a, b, k be the inputs of Algorithm HalfBinaryJacobi, 
and s,j,R the corresponding outputs. If (^) = 2~ 2 iR then: 

(b\a) = (-l) s (d\c) and u(2 j c) <k< v(2?d). 

Proof (outline). We prove the theorem by induction on the param- 
eter k. The key ingredient is that if we reduce a, b mod 2 2fcl+1 in 
step 4, then the GB sequence of ai,&i matches that of a, b, for the 
terms computed by the recursive call at step 5. This is a consequence 
of [13, Lemma 7] (which also holds for binary division with positive 
quotient). It follows that in all the binary divisions with inputs a^, 6« 
in that recursive call, cij and fej/2- 7 ' match modulo 2 Jl+1 the corre- 
sponding values that would be obtained from the full inputs a, b 
(otherwise the corresponding binary quotient qi would be wrong). 
Since here we reduce a, b mod 2 2fcl+2 instead of mod 2 2fcl+1 , Oj and 
6j/2 Ji now match modulo 2 Ji+2 — instead of modulo 2 J4+1 — the val- 
ues that would be obtained from the full inputs a, b, where 2 Ji+2 > 8 
since ji > 1. 



Algorithm 3.1 HalfBinaryJacobi 



Input: a G N, b G N U {0} with = v(a) < v(b), and k G N 
Output: two integers s, j and a 2 x 2 matrix R 
1: if v(b) > k then 

2: Return 0, 0, ( * ° ) 



> b — is possible 



3: fei <- Lfc/2J 

4: oi «- a mod 2 2fcl+2 , bi <- 6 mod 2 2fcl+2 

5: si, ji, i? HalfBinaryJacobi (oi, bi, fci) 

6: a' «- 2~ 231 (fli,i<j + Ri, 2 b), b' <- 2~ 2jl (_R 2 ,ia + J?2, 2 b) 

7: jo <- i/(6') 

8: if jo + ji > k then 

9: Return si, ji, R 

10: s <- jo(a' 2 - l)/8 mod 2 

11: q,r 4— BinaryDividePos(a', b') 

12: b" 4- b'/2 M 

13: if (jo, q) = (1,3) then 

14: d «- a' - b" 

15: m min(^(<i) div 2, fc — ji) 

16: c<- (d- (-l) m d/4 m )/5 

17: s «- so + m(a' - 1) /2 mod 2 

18: (a2, b 2 ) {a — 4c, 2(b" + c)) t> harmless iteration 

/ (4™ + 4(-l) m )/5 2(4'" - (-1D/5 A 

20: else 

21: so «- so + (a' - 1)(6" - l)/4 mod 2 

22: (o 2 , b 2 ) <- (b", r/2 J0 ) > good or bad iteration 



24: m <s— jo 

25: s <- s + jo (a! - l)/8 mod 2 

26: k 2 <- k- (m + ji) 

27: s 2 , j 2 , 5 <- HalfBinaryJacobi(a 2 mod 2 2fe2+2 , b 2 mod 2 2fc2+2 , k 2 ) 

28: Return (so + si + s 2 ) mod 2, ji + j 2 + m, S x Q x R 




At step 10, s depends only on j mod 2 and a' mod 8, at step 17 
it depends on m mod 2 and a' mod 4, and at step 21 on a' mod 4 
and b" mod 4. Since a' and 6" at step 21 correspond to some a; and 
it follows that a' and 6" agree mod 8 with the values that 
would be computed from the full inputs, and thus the correction so 
is correct. This proves by induction that (b\a) = (— l) s (d\c). 

Now we prove that v{2?c) < k < v{2?d\ If there is no harmless 
iteration, v{2?c) < k < u(2 j d) is a consequence of the proof of Theo- 
rem 1 in [13]. In case there is a harmless iteration, first assume that 
m = u(d) div 2 at step 15. The new values 02, &2 at step 18 corre- 
spond to m successive ugly iterations, which yield j — ji + m < k. 
Thus v(2?d2) < k: we did not go too far, and since we are computing 
the same sequence of quotients as Algorithm QuadraticBinaryJa- 
cobi, the result follows. Now if k — ji < u(d) div 2, we would go too 
far if we performed u(d) div 2 ugly iterations, since it would give 
jo := u(d) div 2 > k — j±, thus j '■— ji + jo > k, and z/(2 J, a 2 ) would 
exceed k. This is the reason why we "cut" the harmless iteration at 
m — k — ji (step 15). The other invariants are unchanged. □ 

Finally we can present our 0(M(n) logn) Algorithm FastBinary- 
Jacobi, which computes the Jacobi symbol by calling Algorithm Half- 
BinaryJacobi. The general structure is similar to that described in [8] 
for several asymptotically fast GCD algorithms. 



Algorithm 3.2 FastBinaryJacobi 

Input: a, b G N with = v(a) < v(b) 
Output: Jacobi symbol (b\a) 

1: s «— 0, j u{b) 

2: while 2 3 a / b do 

3: fc<-max(i/(!)), 1(b) div 3) 

4: s',j, R HalfBinaryJacobi(a, b, k) 

5: s ^— (s + s') mod 2 

6: (a, 6) «- 2-' 23 (R 1A a + R li2 b,R2,ia + R 2 , 2 b), j 
7: if a = 1 then return ( — l) s else return 



> £(b) is length of b in bits 



■1/(6) 



Daireaux, Maume-Deschamps and Vallee [5] prove that, for the 
positive binary division, the average increase of the most significant 
bits is 0.65 bits/iteration (which partly cancels an average decrease 



of two least significant bits per iteration); compare this with only 
0.05 bits/iteration on average for the centered division. 6 

4 Experimental Results 

We have implemented the different algorithms in C (using 64-bit 
integers) and in GMP (using multiple-precision integers), as well as 
in Maple/Magma (for testing purposes). 

For max(a, b) < 2 26 the maximum number of iterations of Algo- 
rithm CubicBinaryJacobi is 64, with a = 15548029 and b = 66067306. 
The number of iterations seems to be 0(n) for a,b < 2 n : see Table 1. 
This is plausible because, from heuristic probabilistic arguments, we 
expect about half of the iterations to be good, and experiments con- 
firm this. For example, if we consider all admissible a,b < 2 20 , the 
cumulated number of iterations is 3.585 x 10 12 for 2 38 calls, i.e., an 
average of 13.04 iterations per call (max 48); the cumulated number 
of good, bad and ugly iterations is 51.78%, 25.47%, and 22.75% re- 
spectively. For a,b < 2 60 , a random sample of 10 8 pairs (a, b) gave 
42.72 iterations per call (max 89), with 50.54%, 25.14%, and 24.31% 
for good, bad and ugly respectively. These ratios seem to be con- 
verging to the heuristically expected 1/2 = 50%, 1/4 = 25%, and 
1/4 = 25%. 

When we consider all admissible a,b < 2 20 , the maximum number 
of iterations of QuadraticBinaryJacobi is 37 when a = 933531, b = 
869894, the cumulated number of iterations is 3.405 x 10 12 (12.39 per 
call), the cumulated number of good, bad and harmless iterations is 
54.51%, 26.82%, and 18.67% respectively. For a, b < 2 60 , a random 
sample of 10 8 pairs (a, b) gave 40.21 iterations per call (max 76), with 
53.70%, 26.71%, and 19.59% for good, bad and harmless respectively. 
These ratios seem to be converging to the heuristically expected 
8/15 = 53.33%, 4/15 = 26.67%, and 1/5 = 20%. 

We have also compared the time and average number of itera- 
tions for huge numbers, using the fast gcd algorithm in GMP, say 
gcd — which implements the algorithm from [8] - - and an imple- 
mentation of the algorithm from [13], say bgcd. For inputs of one 



We have computed more accurate values of these constants: 0.651993 and 0.048857 
respectively. 



million 64-bit words, gcd takes about 45.8s on a 2.83Ghz Core 2, 
while bgcd takes about 48.3s and 32,800,000 iterations: this is in ac- 
cordance with the fact proven in [5] that each step of the binary gcd 
discards on average two least significant bits, and adds on average 
about 0.05 most significant bits. Our algorithm bjacobi (based on 
Algorithms 3.1-3.2) takes about 83.1s and 47,500,000 iterations (for 
a version with steps 13-20 of Algorithm 3.1 omitted in the basecase 
routine), which agrees with the theoretical drift of 0.651993 bits per 
iteration. The break-even point between the 0(n 2 ) implementation 
of the Jacobi symbol in GMP 4.3.1 and our 0(M(n)\ogn) imple- 
mentation is about 535 words, that is about 34, 240 bits or about 
10,300 decimal digits (see Fig. 1). 

5 Concluding Remarks 

Weilert [15] says: "We are not able to use a GCD calculation in Z[i] 
similar to the binary GCD algorithm ■ ■ ■ because we do not get a 
corresponding quotient sequence in an obvious manner" . In a sense 
we filled that gap for the computation of the Jacobi symbol, because 
we showed how it can be computed using a binary GCD algorithm 
without the need for a quotient sequence. 

We showed how to compute the Jacobi symbol with an asymp- 
totically fast time bound, using a binary GCD algorithm without the 
need for a quotient sequence. Our implementation is faster than a 
good 0{n 2 ) implementation for numbers with bitsize n > 35000. Our 
subquadratic implementation is available from http://www. loria. 
f r/~zimmerma/sof tware/#jacobi. 

Binary division with a centered quotient does not seem to give a 
subquadratic algorithm; however we can use it with the "cubic" algo- 
rithm (which then becomes provably quadratic) since then we control 
the sign of a, b. For a better quadratic algorithm, we can choose the 
quotient q so that abq < 0, by replacing q by q — 2 J+1 if necessary: 
experimentally, this gains on average 2.194231 bits per iteration, 
compared to 1.951143 for the centered quotient, and 1.348008 for the 
positive quotient. In comparison, Stein's "binary" algorithm gains on 
average 1.416488 bits per iteration [4, §7] [7, §4.5.2]. 
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Table 1. Worst cases for CubicBinaryJacobi(fe|a), max(a, 6) < 2™. 
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Fig. 1. Comparison of CMP 4.3.1 mpz_jacobi routine with our FastBinaryJacobi 
implementation in log-log scale. The rr-axis is in 64-bit words, the t/-axis in milliseconds 
on a 2.83Ghz Core 2. 



